A Study on Text Mining over Hadoop Framework

Authors

  • Meghna Utmal
  • Ajay Kumar Gupta
Abstract

As data volumes grow day by day, text mining approaches play a vital role in extracting potentially useful information and associations from large amounts of text data. Data mining refers to methods that analyze structured data, whereas text mining deals with unstructured or semi-structured data in a variety of formats. Vast amounts of structured and unstructured data are now produced daily by a large number of businesses. Big Data is difficult to work with and requires massively parallel software running on a large number of computers. One platform that supports this kind of processing is Hadoop. It has two major components: the Hadoop Distributed File System (HDFS) and the MapReduce programming paradigm. Both components provide an environment with integrity, performance, availability, and scalability.

Keywords: Big Data; Text Mining; Hadoop; Hadoop Distributed File System; MapReduce.
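As a rough illustration of the MapReduce paradigm named in the abstract, the sketch below simulates the map, shuffle, and reduce steps of a word count (a common first step in text mining) in plain Python on a single machine. It does not use Hadoop or HDFS, and it is not taken from the paper; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are hypothetical and chosen only for this example.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every token in one document."""
    for word in re.findall(r"[a-z]+", text.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map -> shuffle -> reduce over a dict of {doc_id: text}."""
    mapped = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical sample input; on a real cluster these would be files stored in HDFS.
docs = {
    "d1": "Hadoop stores text in HDFS",
    "d2": "MapReduce processes text on Hadoop",
}
counts = word_count(docs)
print(counts)
```

On a real cluster, the map and reduce functions would run in parallel across many nodes, and the framework would handle the grouping step; this single-process version only shows the data flow.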

Similar resources

Survey on Information Retrieval and Pattern Matching for Compressed Data Size using the SVD Technique on Real Audio Dataset

Due to the increasing size of text and audio data on the internet, various techniques are needed to help find and extract very specific information relevant to a user's task. Text mining is a variant of the field of data mining, which tries to discover interesting patterns in large databases. The singular value decomposition (SVD) technique is used for dimensionality reduction of large dat...

Preparation of Improved Turkish DataSet for Sentiment Analysis in Social Media

A public dataset with a variety of properties suitable for sentiment analysis [1], event prediction, trend detection, and other text mining applications is needed in order to perform analysis studies successfully. The vast majority of data on social media is text-based, and it is not possible to apply machine learning processes directly to this raw data, since several different pr...

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

Global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment

Big data mining methods support knowledge discovery on highly scalable, high-volume, and high-velocity data elements. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis, and it manages both homogeneous and heterogeneous computing models. The MapReduce fra...

Distributed Framework for Data Mining As a Service on Private Cloud

Data mining research faces two great challenges: (i) automated mining and (ii) mining of distributed data. Conventional mining techniques are centralized, and the data needs to be accumulated at a central location. The mining tool needs to be installed on the computer before data mining can be performed. Thus, extra time is incurred in collecting the data. Mining is done by specialized analysts who have access ...

Publication date: 2015